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Abstract. The 2dF Galaxy Redshift Survey has already yielded over 135,000 
galaxy spectra (out of the planned 250,000). We present a method for the pa- 
rameterisation of galaxy spectra in the 2dF and other large redshift surveys using 
a new continuous parameter r\ = 0.44pci — pc2, based on a Principal Components 
Analysis (where pc\ and pc2 are the projections onto the first and second principal 
components respectively). This parameter is designed to be robust to instrumental 
uncertainties and it represents the absorption/emission strength of a galaxy. 

1 The 2dF Galaxy Redshift Survey 

The 2dF Galaxy Redshift Survey § is a combined UK- Australian effort to 
map the distribution of galaxies brighter than bj = 19.45 (with median red- 
shift z ~ 0.1). In so doing it will obtain a total of 250,000 galaxy spectra 
from which redshifts will be determined, a factor of 10 more than any previ- 
ous survey. 

Crucial to extracting the full scientific content of the survey is an accurate 
characterisation of the galaxy population in terms of spectral features. To this 
end we address the problem of obtaining an accurate and meaningful spectral 
classification that can be used in future analyses e.g. of luminosity functions 
and correlation functions. 

2 Principal Components Analysis 

Principal Components Analysis (PCA) allows us to easily visualise overall 
trends (if any) within a multi-dimensional population Q Q Q. It does this 
by reducing the dimensionality of the parameter space (here spectral bins) 
down to just those components which are the most informative (in the sense 
of maximum variance). Any clustering in the space defined by the PCA is 
indicative of distinct subpopulations within the sample. 

In terms of reduced dimensionality we find that when we apply PCA to the 
first 100,000 galaxy spectra obtained in the 2dFGRS that rather than using 
the original 738 spectral channels to describe each spectrum, we can use only 
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three projections and still retain two thirds of the total variance within the 
population Q . The significance of each successive principal component drops 
off very sharply so these first few components are by far the most important. 

2.1 PCA on Volume-Limited Samples 

Due to the large number of galaxy spectra acquired so far in the 2dFGRS it 
has now become possible to consider volume-limited subsets of the galaxies 
for the PCA. This approach is more physically motivated compared to con- 
sidering the entire flux-limited data set in which the spectral population can 
be biased by relationships between magnitude and spectral type. 

Wc find that using an absolute Bj magnitude cut of -17.0 (corresponding 
to a redshift z ~ 0.06) gives a representative subset of the local population. 
We therefore perform the PCA on this subset of the population to determine 
the principal components which are then used to determine projections for 
the remaining galaxies (Fig. |l|). 
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Fig. 1. Projections in the pci,pc2 space for the galaxies observed to date in the 
2dFGRS. The trends indicated have been derived from the eigenspectra by iden- 
tifying linear combinations that either maximise or minimise the influence of the 
absorption/emission features. 

2.2 Identifying Trends 

We can identify general trends in the distribution of projections shown in 
Fig. p] by considering the physical significance of the first two eigenspec- 
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tra (Fig. g). It can be clearly seen that whilst the first eigenspectrum con- 
tains both a continuum component and a measure of the average emis- 
sion/absorption strength, the second contains only the latter (to a first ap- 
proximation). Therefore by taking certain linear combinations of the two we 
can either maximise or minimise the contribution from the line features in 
the spectrum. By performing such an analysis we can conclude that the dis- 
tribution of Fig. [l] is indicative of a colour sequence from red (bottom left) 
to blue (top right) and an orthogonal sequence of absorption (top left) to 
emission (bottom right). 
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Fig. 2. The first two eigenspectra are shown in the top two panels and the lin- 
ear combinations which either minimise (a) or maximise (b) the effect of emis- 
sion/absorption features are shown in the bottom two. 



3 Instrumental Uncertainties 

Unfortunately whilst carrying out observations using the 2dF instrument al- 
lows us to collect an unprecedented number of spectra simultaneously and 
hence determine many redshifts in a small amount of time, we must make a 
compromise with our spectral quality. This is deemed to be a necessary sac- 
rifice in view of carrying out a project as ambitious as the 2dFGRS, however 
if one wishes to characterise the resultant population of galaxies using their 
spectral properties care must be taken in order to ensure that the classifica- 
tion scheme is robust to instrumental uncertainties. 

The 2dF instrument makes use of up to 400 fibre optic cables with a 
diameter corresponding to (depending on plate position) ~2.0" on the sky B. 
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The quality of the spectra can be compromised in many ways, the most 
important of which are: 

• Astrometric errors in determining the actual position of the galaxy on the 
sky can result in the fibre being placed in a position that isn't optimal 
for light collection. In general the position of a galaxy centre is known to 
within ~ 0.4" . In addition to this there is a random positioning error of 
the order of -0.1" RMS. 

• The observed colour (continuum slope) of the galaxy depends on the 
positioning of the fibre aperture. This is due to small errors in the 2dF 
corrector which result in relative spatial displacements between different 
components of a galaxies spectrum and can be as large as > 1" (at radius 
of 30') between the incident blue and red components of the observed 
spectrum. 

• The apparent size of a galaxy on the sky can be larger than the fibre 
aperture (which corresponds to — 2.5ft- -1 kpc at the surveys median 
redshift of z = 0.1 in an Einstein - de Sitter Universe), the positioning 
of this aperture can therefore result in an unrepresentative spectrum due 
to colour gradients within the galaxy. 

Whilst the effect of these errors at first seems large one must bear in 
mind that the average seeing at the 2dF is of the order of — 1.5", enough 
to significantly smooth out most of the errors. The distortions mentioned in 
the second point result in unstable continuum measurements for the observed 
galaxies. 

4 Spectral Classification 

We are presented with several options in attempting to derive a classification 
for the the observed 2dF galaxies based upon their spectra. However there 
are several issues that one must first consider; 

• The distribution of galaxies in the (pci,pc2) plane is smooth, representing 
a continuous sequence from absorption to emission and from red to blue. 
There is no evidence to suggest any division into distinct classes of galaxy 
spectra. 

• The instrumental errors result in unstable continuum measurements. On 
the other hand small scale features such as line-strengths are relatively 
unaffected. 

• The relationship between morphology and spectral properties is not very 
well understood and hence attempting to anchor the classification using 
a morphological training set would be premature. A more robust and 
quantifiable measure of spectral properties is required. 
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4.1 r\ Parameterisation 

By projecting the pc\ and pci components of each galaxy onto the linear 
combination which maximises the effect of emission/absorption features we 
are in effect high-pass filtering the spectra. Thus we would expect (and indeed 
we find) this projection to be relatively stable to uncertainties in continuum 
measurements . 

By using this projection we are determining a measure of the average 
emission/absorption strength of a galaxy which is easily quantifiable and ro- 
bust. In addition this projection is also representative of the spectral sequence 
of the galaxy population since it is composed of the most significant principal 
components. 

We therefore choose to adopt this projection, which we shall denote 77, as 
our continuous measure of spectral type: 

T] = a.pci-pc 2 (1) 

The value of a which maximises the emission/absorption features is found 
to be a = 0.44 ±0.06. 

In Fig. H the distribution of the 77 projections are shown for the galaxies 
observed to date in the 2dFGRS. Also shown in the same figure is the 77- 
morphology relation for a sample of galaxies from the Kennicutt Atlas 0. 
Comparing the two data sets shows that there is a correspondence between 
the sequence of 77 and that of morphology. Note that this correspondence can 
only be treated as an approximation since our sample of Kennicutt galaxies 
is not complete and only represents a few high Signal-to- Noise spectra with 
well determined morphologies. 



5 Summary 

We have attempted to derive a meaningful and robust parameterisation of the 
observed spectral distribution in order to characterise the galaxy population 
of the 2dFGRS. In so doing we have defined a new variable 77 which represents 
the most significant component of the galaxy sequence that is unaffected by 
instrumental uncertainties, but still retains physical information. 

The parameter 77 is easily interpreted as an averaged absorption/emission 
strength and can be shown to have significant correlations with more con- 
ventional classifiers such as morphology and the Ha equivalent width. 

There are certain limitations to our analysis, the most notable of which is 
that we have only derived a classification in the broadest sense. We have not 
attempted to identify the distinct sub-populations that are known to exist 
buried within the galaxy sequence, however there is no reason that such an 
analysis cannot be performed on the data in the future. Other significant 
limitations to our analysis are the assumption that each spectrum can be 
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Fig. 3. Histogram (top) of the r\ = 0.44pci — pc2 projections for the galaxies ob- 
served to date in the 2dFGRS. Also shown (below) is the ^-morphology relation for 
a sample of galaxies from the Kennicutt Atlas B. 



reconstructed using a linear combination of only a few of the derived eigen- 
spectra and the restriction to variance as a measure of information (both of 
which are implicitly assumed in the PC A). These assumptions greatly sim- 
plify the analysis but are known to not be true in general. It is in fact possible 
to extend the analysis without the use of these assumptions to perform an 
Independent Component Analysis. Another example of a non-linear method 
that has recently been investigated is that of an information-based clustering 
algorithm 0. Preliminary results from such analyses look promising. 
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